Gradient vector
The gradient vector, noted ∇θMSE(θ), contains all the partial derivatives of the cost function (one for each model parameter).
Equation 4-6. Gradient vector of the cost function

∇θMSE(θ) = (2/m) Xᵀ (Xθ − y)
Once you have the gradient vector, which points uphill, just go in the opposite direction to go downhill.
This is where the learning rate η comes into play: multiply the gradient vector by η to determine the size of the downhill step (Equation 4-7).

Equation 4-7. Gradient Descent step

θ(next step) = θ − η ∇θMSE(θ)
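As a rough sketch of how these two equations translate to NumPy (the toy data here is an assumption for illustration, not the book's listing: X carries a bias column of 1s, y is a noisy linear target, and eta and n_iterations are hypothetical values):

```python
import numpy as np

# Hypothetical toy data: 100 instances, one feature, bias column prepended.
np.random.seed(42)
X = np.c_[np.ones((100, 1)), np.random.rand(100, 1)]
y = 4 + 3 * X[:, 1:] + np.random.randn(100, 1)

eta = 0.1                      # learning rate η
n_iterations = 1000
m = len(X)                     # number of training instances

theta = np.random.randn(2, 1)  # random initialization of the parameters

for iteration in range(n_iterations):
    # Equation 4-6: gradient of the MSE cost, (2/m) Xᵀ(Xθ − y)
    gradients = 2 / m * X.T @ (X @ theta - y)
    # Equation 4-7: step downhill, scaled by the learning rate
    theta = theta - eta * gradients
```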
A simple solution is to set a very large number of iterations but to interrupt the algorithm when the gradient vector becomes tiny, that is, when its norm becomes smaller than a tiny number ϵ (called the tolerance), because this happens when Gradient Descent has (almost) reached the minimum.
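Continuing the sketch above (it reuses np, X, y, m, and eta; epsilon and max_iterations are hypothetical names standing in for the tolerance ϵ and the "very large number of iterations"), that stopping rule might look like:

```python
epsilon = 1e-6            # tolerance ϵ: stop once the gradient's norm is this small
max_iterations = 100_000  # very large iteration cap, so we never loop forever

theta = np.random.randn(2, 1)  # fresh random initialization
for iteration in range(max_iterations):
    gradients = 2 / m * X.T @ (X @ theta - y)
    if np.linalg.norm(gradients) < epsilon:
        break             # gradient is tiny: (almost) at the minimum
    theta = theta - eta * gradients
```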